The Condorcet project: indexing scientific documents

Authors

  • Bas van Bakel
  • Reinier Boon
  • Nicolaas J.I. Mars
  • Jeroen Nijhuis
  • Erik Oltmans
  • Paul E. van der Vet
Abstract

This paper presents Condorcet, a domain-specific prototype indexing system for tens of thousands of documents covering two scientific domains: engineering ceramics and epilepsy. The development corpus consists of 800 documents taken from one-year volumes of two scientific journals. Condorcet takes a controlled-term approach to indexing. The indexing process makes intensive use of linguistic knowledge. The paper discusses how principle-based natural language processing strategies and structured knowledge sources are used in a semi-automatic, controlled-term...

Similar resources

Combining linguistic and knowledge-based engineering for information retrieval and information extraction

Controlled-term indexing (the method of choice for multimedia collections and still very popular for purely textual material) appears to be an expensive solution because it requires huge resources and manual indexing. It is not possible, however, to perform a well-founded assessment of various approaches to information retrieval. We discuss ways to improve controlled-term indexing and illustrate these b...

Large-Scale Semantic Indexing of Biomedical Publications

Automated annotation of scientific publications in real-world digital libraries requires dealing with challenges such as a large number of concepts and training examples, multi-label training examples, and the hierarchical structure of concepts. BioASQ is a European project that contributes a large-scale biomedical publications corpus for working on these challenges. This paper documents the participa...

QCT and SF services in Torii: Human Evaluations of Documents Benefit to the Community

This paper describes two services of the Torii portal dedicated to the High Energy Physics research community and developed within the context of the TIPS European project. Both services relate to the reuse of evaluations performed by humans on scientific publications. The first one, called QCT (Quality Control Tools), aims at collecting detailed human evaluations of documents in order to...

A Survey of Indexing and Retrieval of Multimodal Documents: Text and Images

A document conveys information using multiple modalities, including text, layout/style and images. For example, journal articles usually have figures to illustrate experimental results, and the title in a journal article usually has a different font size than the body text. Indexing and retrieval using only text is the traditional way of IR (Information Retrieval). With the development of the I...

Concept Mining for Indexing Medical Literature

This article addresses the task of mining concepts from biomedical literature in order to index and search this document base. The research takes place within the Telemakus project, whose goal is to support and facilitate the knowledge discovery process by providing retrieval, visual, and interaction tools to mine and map research findings from research literature in the field of aging. A...

Publication date: 2007